3. BASICS FOR BEGINNERS
DAY-TO-DAY USE 
A. Creating Files—The Editor 


If users have to type a paper, a letter, or a program, how do they get the information stored in the
machine? Most of these tasks are done with the UNIX operating system “text editor”. See ed(1) for more details.
Since the text editor is thoroughly documented in ed(1) and explained in the “Tutorial—Text Editor” section 
of this volume, no description is provided here on how to use it. 


Throughout this section, each reference of the form name(1M), name(7), or name(8) refers to entries in 
the UNIX System Administrator’s Manual. All other references to entries of the form name(N), where “N” 
is a number (1 through 6) possibly followed by a letter, refer to entry name in section N of the UNIX System 
User’s Manual. 


A file is just a collection of information stored in the machine. This is a simplistic but adequate definition.
The following text describes how to make some files. To create a file called junk with text in it, do the following:


ed junk (invokes the text editor)
a (command to “ed” to add text)
now type in
whatever text you want ...
. (signals the end of adding text)


The “.” that signals the end of adding text must be at the beginning of a line by itself. Do not forget it, for until 
it is typed, no other ed commands will be recognized—everything you type will be treated as text to be added. 
Also note that no system prompt appears while you are appending, inserting, or changing text while in the text 
editor. 


At this point the user can do various editing operations on the text which was typed in, such as correcting 
spelling mistakes, rearranging paragraphs, etc. Finally, the user must write the information typed into a file 
with the editor command: 


w 
The ed will respond with the number of characters it wrote into the file junk. 


Nothing is stored permanently in the junk file until the w command is used. If the user is editing a file and 
hangs up before using the w command, the changes are not stored in the working file. The data in this case 
is saved in a file called ed.hup which the user can continue working with at the next editing session. But after 
w the information is there permanently. The user can reaccess it any time by typing the following: 

ed junk 


Type a q command to quit the editor. (If you try to quit without writing, the text editor will print a “?” to remind 
you. A second q gets the user out of the text editor regardless.) Now create a second file called temp in the same 
manner. You should now have two files, junk and temp. 


B. What files are out there? 


The ls(1) command lists the names (not contents) of any of the files that the UNIX operating system knows
about. If you type


ls
the response will be


junk 
temp 


which are indeed the two files just created. The names are sorted into alphabetical order automatically, but 
other variations are possible. For example, the command 


ls -t


causes the files to be listed in the order in which they were last changed, most recent first. The -l option gives
a “long” listing and is used as follows:


ls -l


to produce something like


-rw-rw-rw- 1 bwk bsk 41 Jul 22 02:56 junk 
-rw-rw-rw- 1 bwk bsk 78 Jul 22 12:57 temp 


The date and time are those of the last change to the file. The 41 and 78 are the number of characters
(which should agree with the numbers you got from ed). The “bwk” is the owner of the file, i.e., the person who
created it. The “bsk” identifies the group associated with “bwk”. The “-rw-rw-rw-” shows who has permission
to read, write, or execute the file. In this case the owner, group, and others all have permission to read (r)
and write (w). Note that there is no permission for anyone to execute (x). The first character in “-rw-rw-rw-”
is a “-” which indicates this is a file of data. A “d” in the first character would indicate a directory. The remaining
nine characters are divided into three sets of permissions. Each set consists of three characters. The three
sets correspond to the permissions of the owner, group, and all other users.


Options can be combined: ls -lt gives the same thing as ls -l but sorted into time order. The user can also
name the files of interest, and ls will list the information about them only. More details can be found in ls(1).


The use of optional arguments that begin with a minus sign (like -t and -lt) is a common convention for
UNIX programs. In general, if a program accepts such optional arguments, they precede any file name arguments.
It is also vital that you separate the various arguments with spaces: ls-l is not the same as ls -l since
the command ls must be separated from its argument -l by a space. Try using the command both ways and
observe the results.
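The listing options described above can be tried safely in a scratch directory. The following is only a sketch: it creates junk and temp with printf rather than with ed (an assumption made so that the example runs unattended), and the scratch directory comes from mktemp, which is not part of the original text.

```shell
# Work in a throwaway directory so that no real files are touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Stand-ins for the two files created with ed earlier.
printf 'some text\n' >junk
printf 'some more text\n' >temp

ls           # names only, in alphabetical order
ls -t        # most recently changed first
ls -l        # long listing: permissions, owner, group, size, date
ls -lt       # options combined: long listing in time order
ls -l junk   # information about one named file only

names=$(ls)  # capture the plain listing for inspection
```

Each option goes after the command name and before any file names, separated by spaces.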


C. Printing Files 


Now that you have created a file of text, how can the file be printed so people can look at it? There are several
ways to print a file. One simple way to obtain a print is to use the editor, since printing is often done just before 
making changes anyway. The editor is used to print as follows: 


ed junk 
1,$p 


The ed will reply with the count of the characters in junk and then print all the lines in the file. The user can 
also be selective about the parts of a file to be printed as follows: 


ed junk 
20,35p 
There are times when it is not feasible to use the editor for printing. For example, there is a limit on how
big a file ed can handle (several thousand lines). Secondly, it will only print one file at a time, and sometimes 
you want to print several, one after the other. So here are a couple of alternatives. 


The simplest of all the printing programs is cat(1). The cat simply prints on the terminal the contents of 
all the files named and in the order listed. Thus the files are concatenated and printed. For example: 


cat junk 
prints one file, and 
cat junk temp 
prints two files. The files are simply concatenated onto the terminal. 

The pr(1) command produces formatted printouts of files. As with cat, pr prints all the files named in a 
list. The difference is that it produces headings with date, time, page number, and file name at the top of each 
page, and extra lines to skip over the fold in the paper. Thus, 

pr junk temp 
will print junk neatly, then skip to the top of a new page and print temp neatly. 
The pr can also produce multicolumn output. Inputting 
pr -3 junk


prints junk in 3-column format. You can use any reasonable number in place of “3” and pr will do its best. The 
pr command has other capabilities also. See pr(1) for more information. 


It should be noted that pr is not a formatting program in the sense of shuffling lines around and justifying 
margins. The true formatters are nroff and troff, which we will get to in the section on document preparation. 


There are also programs that print files on a hard copy printer. See lp(1) for more information.
D. Moving Files Around 


The user is ready for bigger things after gaining experience in creating and printing files. For example, the 
user can move a file from one place to another (which amounts to giving it a new file name), like this: 


mv junk precious 


This means that what used to be named junk is now named precious. An ls(1) command would now result in
the following: 


precious 
temp 


The contents of junk are now in precious. Notice that the junk file no longer exists. Beware that if you move 
a file to another one that already exists, the already existing file contents are lost forever. 


If you want to make a copy of a file (i.e., to have two versions of something), use the cp(1) command as follows:


cp precious temp1
This makes a duplicate copy of precious in temp1.


When you are finished creating and moving files, the files can be removed from the file system by the rm(1) 
command. The command is used as follows: 


rm temp temp1


This will remove both the temp and temp1 files.


The user will get a warning message if one of the named files is not there, but otherwise, rm like most UNIX 
system commands does its work silently. There is no prompting or response, and error messages are occasionally 
shortened. This terseness is sometimes disconcerting to newcomers, but experienced users find it desirable. 
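The move, copy, and remove steps above can be sketched as one short session. This is an illustration only: the scratch directory and the printf lines that create the starting files are assumptions, not part of the original procedure.

```shell
# Work in a throwaway directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'first file\n' >junk
printf 'second file\n' >temp

mv junk precious     # junk no longer exists; its contents are in precious
cp precious temp1    # temp1 is a duplicate of precious
rm temp temp1        # both files are removed, with no message

remaining=$(ls)      # only precious should be left
contents=$(cat precious)
```

Note that none of the three commands prints anything when it succeeds.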


E. What's in a File Name 


So far we have used file names without ever saying what is a legal name, so it is time for a couple of rules. 
First, file names are limited to 14 characters, which is enough to be descriptive. Second, although any character 
can be used in a file name, common sense dictates sticking to ones that are visible and avoiding characters that 
could be used with other meanings. We have already seen, for example, that in the ls(1) command, ls -t means
to list in time order. So if a file existed whose name was -t, you would have a tough time listing it by name.
Besides the minus sign, there are other characters which have special meaning. To avoid pitfalls, use only letters,
numbers, and the period until you are familiar with the situation.


On to some more positive suggestions. Suppose you are typing a large document like a book. Logically, this 
divides into many small pieces, like chapters and perhaps sections. Physically, it must be divided too, for ed 
will not handle really big files. Thus the document should be typed as a number of files. One possible method 
is to have a separate file for each chapter as follows: 


chap1
chap2
...


Another method is breaking each chapter into several files as follows: 
chap1.1


chap1.2 
chap1.3 


chap2.1 
chap2.2 


It can now be determined at a glance where a particular file fits into the whole. 


There are advantages to a systematic naming convention which are not obvious to the novice UNIX system 
user. To print the whole book, the user could enter the following: 


pr chap1.1 chap1.2 chap1.3 ...


Using the pr(1) command like this would be tiring and possibly lead to making mistakes. Fortunately, there 
is a shortcut. The user can enter: 


pr chap* 


The * means “anything at all”, so this translates into “print all files whose names begin with chap, listed in
alphabetical order”.
This shorthand notation is not a property of the pr command, by the way. It is system-wide, a service of
the program that interprets commands—the “shell”, sh(1). The files in the book can be listed by using 


ls chap* 
which produces the following: 

chap1.1 

chap1.2 

chap1.3 
The * is not limited to the last position in a file name. The * can be used anywhere and can occur several times. 
Thus entering 

rm *junk* *temp* 


removes all files that contain junk or temp as any part of their name. As a special case, * by itself matches every
file name, so 


pr * 
prints all your files (alphabetical order), and 
rm * 


removes all files. (Before using the rm * command, make sure that none of the files are still needed!)


The * is not the only pattern-matching feature available. To print only chapters 1 through 4 and 9, use the 
following command: 


pr chap[12349]* 


The [...] means to match any of the characters inside the brackets. A range of consecutive letters or digits can 
be abbreviated as follows: 


pr chap[1-49]* 


Letters can also be used within brackets. The [a-z] pattern-matching feature matches any character in the
range a through z.


The ? pattern matches any single character, so 
ls ?
lists all files which have single-character names, and


ls -l chap?.1
lists information about the first file of each chapter (chap1.1, chap2.1, etc.).


Of these niceties, * is certainly the most useful to become familiar with. The others are frills, but worth 
knowing. 
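Because the shell, not the individual command, expands these patterns, their effect can be previewed harmlessly with echo before a pattern is handed to something like rm. The sketch below assumes a scratch directory and a handful of hypothetical, empty chapter files. The use of echo is a choice made for this illustration.

```shell
# Work in a throwaway directory with some hypothetical file names.
dir=$(mktemp -d)
cd "$dir" || exit 1
for f in chap1.1 chap1.2 chap2.1 chap3.1 chap9.1 x; do
    : >"$f"                   # create each file empty
done

all=$(echo chap*)             # every name that begins with chap
some=$(echo chap[1-29]*)      # chapters 1, 2, and 9 only
first=$(echo chap?.1)         # the first file of each chapter
single=$(echo ?)              # names that are one character long
quoted=$(echo '?')            # single quotes turn the special meaning off
```

The quoted form hands the literal character ? to the command instead of a list of file names.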


If the special meaning of *, ?, etc., needs to be turned off, enclose the entire argument in single quotes as
follows:


ls '?'
Some examples of this will be shown in the following paragraphs.
F. What's in a File Name, Continued 


When the file called junk is first created, how did the system know that there was not another junk somewhere
else, especially since the person in the next office could also be reading this tutorial? The answer is that
generally each user has a private directory, which contains only the files that belong to that particular user. 
When you log in, you are “in” your directory. Unless the user takes special action when creating a new file, the 
new file is made in the directory that the user is currently in. This is most often your own directory, and thus 
the file is unrelated to any other file of the same name that might exist in another (someone else’s) directory. 


The set of all files is organized into a (usually big) tree with your files located several branches into the 
tree. It is possible for you to “walk” around this tree and find any file in the system by starting at the root of 
the tree and walking along the proper set of branches. Conversely, the user can start at their present location 
and walk toward the root. 


Try the latter first. The basic tool is the command pwd(1) (print working directory) which prints the name 
of the directory the user is currently in. 


Although the details will vary according to the system the user is on, the pwd(1) command will print something
like:


/usr/your_name 
This message indicates that the user is currently in the directory your_name, which is in turn in the directory
/usr, which is in turn in the root directory, called by convention just /. (Even if it is not called /usr on your system,
the message will be something analogous. Make the corresponding changes and read on.)
If the user now types


ls /usr/your_name 


the results should be exactly the same list of file names as obtained from a plain ls(1). With no arguments, ls
lists the contents of the current directory. Given the name of a directory, it lists the contents of that directory. 


Next, try using the following command: 
ls /usr 


This should print a long series of names, among which is your own login name your_name. On many systems, 
usr is a directory that contains the directories of all the normal users of the system. 


The next step is to try the following: 
ls /
The response should be something like this (although again the details may be different): 
bin
dev
etc
lib
tmp
usr


This is a collection of the basic directories of files that the system knows about; we are at the root of the tree. 
If junk is still in your directory, enter the following: 
cat /usr/your_name/junk 
The name 
/usr/your_name/junk 


is called the pathname of the file that is normally thought of as junk. The pathname represents the full name 
of the path as followed from the root through the tree of directories to get to a particular file. It is a universal 
rule in the UNIX operating system that anywhere an ordinary file name can be used, the pathname can also 
be used. 
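The rule that a pathname works wherever an ordinary file name does can be checked directly. The following sketch assumes a scratch directory in place of /usr/your_name, since the real location varies from system to system.

```shell
# Work in a throwaway directory; it plays the role of /usr/your_name.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'hello\n' >junk

here=$(pwd)                 # full name of the current directory
short=$(cat junk)           # the file named by its ordinary file name
full=$(cat "$here/junk")    # the same file named by its pathname
```

Both cat commands read the same file, so short and full hold the same text.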


This is not too exciting if all the files of interest are in your own directory; but if you work with someone 
else or on several projects concurrently, it becomes handy indeed. For example, your friends can print your book 
by entering the following: 


pr /usr/your_name/chap* 
Similarly, you can find out what files your neighbor has by entering: 
ls /usr/neighbor


The “neighbor” just entered represents the login name of your neighbor. A copy of one of your neighbor's files 
can be made as follows: 


cp /usr/neighbor/his_file your_file 


If a file owner does not want someone else to have access to the owner’s files, or vice versa, privacy can be 
arranged. Each file and directory has read-write-execute (rwx) permissions for the owner, a group, and everyone 
else, which can be set to control access. See ls(1) and chmod(1) for details. As a matter of observed fact, most
users find openness of more benefit than privacy most of the time. 


As a final experiment with pathnames, try the following: 
ls /bin /usr/bin


Do some of the names look familiar? When a program is run, by typing its name after the prompt character, 
the system simply looks for a file of that name. It normally looks first in your directory (where it typically does 
not find it), then in /bin and finally in /usr/bin. There is nothing magic about commands like cat(1) or ls(1),
except that they have been collected into a couple of places to be easy to find and administer. 


Two or more users can work regularly with common information in a friend’s directory. One way to accomplish
this is to log in as your friend. If you are already logged in as yourself and want to work in a friend’s files,
you could hang up and log in again as your friend. Another method is to simply change the current working
directory as follows:


cd /usr/your_friend


Now when a file name is used in something like cat(1) or pr(1), the command refers to the file in your friend’s 
directory. Changing directories does not affect any permissions associated with a file. If you could not access 
a file from your own directory, changing to another directory will not alter that fact. Of course, if you forget
what directory you are in, type 


pwd 


to find out.


It is usually convenient to arrange your own files so that all the files related to one thing are in a directory 
separate from other projects. For example, when writing your book, you might want to keep all the text
in a directory called book. A directory can be made using the mkdir(1) command. The book directory is made 
as follows: 


mkdir book 


The book directory can now be accessed to input chapters as follows: 
cd book

If you logged in as yourself, the pathname of book is: 
/usr/your_name/book 

To remove the book directory, type: 


rm book/* 
rmdir book 


or 
rm -r book


The rm book/* command removes all files in the book directory. The rmdir book command is then used to 
remove the empty directory. The book directory must be empty before the rmdir command will work. The 
rm -r book command recursively deletes the entire contents of the book directory and then removes the book
directory itself. 


The user can go up one level in the tree of files by entering: 


cd .. 




The “..” is the name of the parent of whatever directory you are currently in. For completeness, “.” is an alternate
name for the directory you are in.
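The directory commands in this section fit together as in the sketch below. It assumes a scratch directory as the starting point, and the chapter file is created with printf purely for illustration.

```shell
# Start in a throwaway directory.
top=$(mktemp -d)
cd "$top" || exit 1
start=$(pwd)

mkdir book                    # make the new directory
cd book || exit 1             # move into it
printf 'Chapter one\n' >chap1
inside=$(pwd)                 # now ends in /book

cd ..                         # go up one level, to the parent
back=$(pwd)

rm book/*                     # empty the directory first ...
rmdir book                    # ... because rmdir removes only empty directories
```

The single command rm -r book would do the work of the last two lines.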


G. Using Files Instead of the Terminal


Most of the commands used so far produce output on the terminal. Other commands, like the editor, take 
input from the terminal. It is universal in UNIX systems that the terminal can be replaced by a file for either 
or both of input and output. As one example, 


ls 
makes a list of files on your terminal. But if the user enters
ls >filelist
a list of your files will be placed in the file filelist (which will be created if it does not already exist or overwritten
if it does). The symbol > means “put the output on the following file, rather than on the terminal”. Nothing
is produced on the terminal. As another example, the user could combine several files into one by capturing the
output of cat in a file: 


cat f1 f2 f3 >temp
Another symbol, that operates very much like > does, is >>. The >> means “add to the end of”. That is,
cat f1 f2 f3 >>temp


means to concatenate f1, f2, and f3 to the end of whatever is already in temp, instead of overwriting the existing
contents. As with >, if temp does not exist, it will be created. 


In a similar way, the symbol < means to take the input for a program from the following file, instead of 
from the terminal. Thus, the user could make up a script of commonly used editing commands and put them 
into a file called script. The script could then be run on a file by entering: 


ed file <script 


Another example is using ed to prepare a letter in file let. The letter (file let) could then be sent to several people
as follows: 


mail adam eve mary joe <let 
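The three redirection symbols can be seen side by side in a short sketch. It assumes a scratch directory and three small files created with printf. The use of wc(1) as the program reading its input from a file is a choice made for this illustration.

```shell
# Work in a throwaway directory with three one-line files.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'one\n' >f1
printf 'two\n' >f2
printf 'three\n' >f3

cat f1 f2 >temp          # temp is created (or overwritten if it exists)
cat f3 >>temp            # f3 is added to the end of temp
lines=$(wc -l <temp)     # wc takes its input from temp, not the terminal
combined=$(cat temp)
```

Using > a second time instead of >> would have replaced the first two lines rather than adding to them.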
H. Pipes 
One of the novel contributions of the UNIX operating system is the idea of a pipe. A pipe is simply a way 
to connect the output of one program to the input of another program, so the two run as a sequence of 
processes—a pipeline. 
For example, 


pr f g h


will print the files f, g, and h, beginning each on a new page. Instead of printing the files separately, the files
can be printed together as follows:


cat f g h >temp
pr <temp 
rm temp 


This method is more work than necessary. To take the output of cat and connect it to the input of pr, use the 
following pipe: 


cat f g h | pr


The vertical bar | means to take the output from cat, which would normally have gone to the terminal, and put
it into pr to be neatly formatted. 


There are many other examples of pipes. For example, 
ls | pr -3


prints a list of your files in three columns. The program wc(1) counts the number of lines, words, and characters
in its input; and as seen earlier, the who(1) command prints a list of users currently logged on the system, one
per access port. Thus


who | wc -l
tells how many people are logged on. And of course


ls | wc -l
counts your files. 


Most programs that read from the terminal can read from a pipe instead. Most programs that write on the
terminal can write on a pipe instead. There can be as many commands in a pipeline as desired. 
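The saving that a pipe offers over a temporary file can be sketched as follows. The scratch directory and the small files f, g, and h are assumptions made so that the example is self-contained. wc(1) is used in place of pr(1) so that the result is a single number.

```shell
# Work in a throwaway directory with three small files.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'a\nb\n' >f
printf 'c\n' >g
printf 'd\ne\n' >h

# The long way: collect the output in an intermediate file.
cat f g h >temp
long=$(wc -l <temp)
rm temp

# The same count with a pipe and no intermediate file.
short=$(cat f g h | wc -l)

# A pipeline may contain more than two commands.
count=$(ls | sort | wc -l)
```

Both methods count the same five lines, and the last pipeline counts the three files left in the directory.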


Many UNIX operating system programs are written to take input from one or more files if file arguments
are given. If no arguments are given, the programs will read from the terminal, and thus can be used in pipelines.
One example, using the pr(1) command to print files a, b, and c in three columns and in the order specified, is
as follows:


pr -3 a b c


But in


cat a b c | pr -3


the pr prints the information coming down the pipeline, still in three columns.
I. The Shell

The mysterious “shell” mentioned previously is actually the sh(1) command. The shell is the program that 
interprets what is typed as commands and arguments. The shell also looks after translating *, etc., into lists 
of file names, and <, >, and | into changes of input and output streams.

The shell has other capabilities too. For example, the user can run two programs with one command line
by separating the commands with a semicolon. The shell recognizes the semicolon and breaks the line into two 
commands. Thus 

date; who 
does both commands before returning with a prompt character. 

More than one program can run simultaneously if desired. This is beneficial when doing something time-consuming,
like using the editor script. Running programs simultaneously avoids waiting around
for the results before starting something else. An example follows:


ed file <script & 


The ampersand at the end of a command line means “start this command running, then take further commands 
from the terminal immediately”, that is, do not wait for it to complete. Thus the script will begin, but the user 
can do something else at the same time. Of course, to keep the output from interfering with what you are doing 
on the terminal, it would be better to enter 


ed file <script >script.out & 


which saves the output lines in a file called script.out. 


When a command is initiated with &, the system replies with a number called the process number. Programs
running simultaneously can be terminated as follows:


kill process_number
The process number is used to identify the command to be stopped. If you forget the process number, the ps(1)
command will list the process number for all programs you are running. (Entering kill 0 will kill all your processes.)
And if you are curious about other people, ps -a will provide information about all active programs
that other users are currently running.

To start three commands that will execute in the order specified and in the background, enter the following: 

(command_1; command_2; command_3) &
A background pipeline can be started as follows: 
command_1 | command_2 &
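A background command and its process number can be sketched as below. Several things here are additions for the sake of a runnable illustration and do not come from the text above: sleep stands in for any long-running command, the shell variable $! is used to pick up the process number without reading it off the terminal, and wait is used to collect the result.

```shell
# sleep stands in for a time-consuming command (an assumption).
sleep 30 &
pid=$!                   # the process number of the background command

kill "$pid"              # stop it without waiting for it to finish
status=0
wait "$pid" 2>/dev/null || status=$?   # non-zero, because it was killed

# Several commands run in order, as one unit, in the background.
(date; date) >/dev/null &
wait                     # pause until all background commands are done
```

Without the ampersand, the shell would have waited the full 30 seconds before accepting another command.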

Just as the editor or some similar program can get its input from a file instead of from the terminal, the 
shell can read a file to get commands. For instance, suppose the user wants to perform a sequence of actions 
after every log in such as: 

• Set the tabs on the terminal
• Find out the date
• Find out who is on the system.


The three necessary commands to perform these actions [tabs(1), date(1), and who(1)] could be put in a file 
called startup. The startup file would then be run as follows: 


sh startup 


This instruction commands the machine to run the shell with the file startup as input. The effect is the same 
as typing the contents of startup on the terminal. 


If this is to be a regular thing, the need to type sh every time can be eliminated by typing the following 
command only once: 


chmod +x startup 
To run the sequence of commands thereafter, the user only needs to enter: 
startup 


The chmod(1) command marks the file as being executable. The shell recognizes this and runs it as a sequence 
of commands. 
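The two ways of running a file of commands can be sketched as follows. This is an illustration under stated assumptions: the scratch directory is hypothetical, tabs(1) is left out of the file because it needs a real terminal, and the file is run as ./startup because the current directory is not always on the shell's search path.

```shell
# Work in a throwaway directory.
dir=$(mktemp -d)
cd "$dir" || exit 1

# A small startup file holding two of the commands named above.
printf 'who\ndate\n' >startup

sh startup >out1         # run the shell with startup as its input
chmod +x startup         # mark the file as executable, once only
./startup >out2          # from now on it runs like any other command
```

Both runs execute the same commands, so out1 and out2 each end with a line of output from date.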


If the user wants startup to run automatically for every log in, create a file in your login directory called 
.profile and place in it the line “startup”. Upon logging in, the shell gains control and executes the commands 
found in the .profile file. We will get back to the shell in the section on programming. 

DOCUMENT PREPARATION 


UNIX operating systems are used extensively for document preparation. There are two major formatting 
programs, that is, programs that produce a text with justified right margins, automatic page numbering and 
titling, automatic hyphenation, etc. The nroff program is designed to produce output on terminals and line
printers. The troff (pronounced “tee-roff”) program was designed to drive a phototypesetter, which produces
very high quality output on photographic paper. This document was formatted with nroff. 


A. Formatting Packages 


The basic idea of nroff (see troff(1) for more information) and troff(1) is that the text to be formatted contains
within it “formatting commands” that indicate in detail how the formatted text is to look. For example,
there may be commands that specify how long lines are, whether to use single or double spacing, and the running 
titles to use on each page. 


Because nroff and troff are relatively hard to learn to use effectively, several “packages” of canned formatting
requests are available to let you specify paragraphs, running titles, footnotes, multicolumn output, etc.,
with little effort and without having to learn nroff and troff. These packages take a modest effort to learn, 
but the rewards for using them are so great that it is time well spent. 


This section provides a brief description of the “memorandum macros” package known as mm(1). Formatting
requests typically consist of a period and one or two uppercase letters, such as


.TL
which is used to introduce a title, or
.P
to begin a new paragraph.
The text of a typical document is entered so it looks something like this: 


.TL
title
.AU "author information"
.MT "memorandum type"
.P
Enter text ...
.P
More text ...
.SG "signature"

The lines that begin with a period are the formatting macro requests. For example, .P calls for starting a new
paragraph. The precise meaning of .P depends on the output device being used (typesetter or terminal, for instance)
and the publication the document will appear in. For example, -mm normally assumes that a paragraph
is preceded by a space (one line in nroff and one-half line in troff) with the first word indented. These rules
can be changed if desired, but they are changed by changing the interpretation of .P, not by retyping the document.

To actually produce a document in standard format using -mm, use the command
troff -mm files ...


for the typesetter, and 


nroff -mm files ...
for a terminal. The -mm argument tells troff and nroff to use the memorandum macros package of formatting requests.
There are several similar packages; check with a local expert to determine which ones are in common use on
your machine. 


B. Supporting Tools 


In addition to the basic formatters, there is a host of supporting programs that help with document preparation.
The list in the next few paragraphs is far from complete, so browse through the UNIX System User’s
Manual and check with UNIX operating system users for other possibilities. 


The eqn(1) and neqn (see eqn(1) for more information) programs let you integrate mathematics into the
text of a document in an easy-to-learn language that closely resembles the way you would speak it aloud. For 
example, the eqn input 


sum from i=0 to n x sub i ~=~ pi over 2


produces the output

\[ \sum_{i=0}^{n} x_i = \frac{\pi}{2} \]


The program tbl(1) provides an analogous service for preparing tables. The tbl program does all the computations
necessary to align complicated columns with elements of varying widths.


The spell(1) program detects possible spelling mistakes in a document. The spell program compares the 
words in your document to a dictionary (stored in memory) and prints those words that are not in the dictionary. 
It knows enough about English spelling to detect plurals and the like, so it does a good job. 


The grep(1) program looks through a set of files for lines that contain a particular text pattern (rather 
like the editor’s context search does, but on a bunch of files). For example, 


grep 'ing$' chap*


will find all lines that end with the letters ing in the files chap*. The “$” indicates that the pattern to search
for is at the end of the line, whereas a “^” indicates that the pattern to search for is at the beginning of a line.
(It is almost always a good practice to put single quotes around the pattern to be searched for in case it contains 
characters like * or $ that have a special meaning to the shell.) The grep program is often used to locate the 
misspelled words detected by the spell program. 
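The two anchoring characters can be compared on a small file. The sketch below assumes a scratch directory and a hypothetical three-line file named chap1. Because only one file matches chap*, grep prints the matching lines without a file name in front.

```shell
# Work in a throwaway directory with one hypothetical chapter file.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'We are editing\nediting is done\ningots are heavy\n' >chap1

at_end=$(grep 'ing$' chap*)     # lines that end with ing
at_start=$(grep '^ing' chap*)   # lines that begin with ing
```

The single quotes keep the shell from treating $ and ^ specially, while chap* is left unquoted so that the shell still expands it.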


The diff(1) program prints a list of the differences between two files, so that two versions of something 
can automatically be compared. This is a vast improvement over proofreading by hand. 
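A minimal comparison of two versions might look like the following sketch. The file names old and new and their contents are hypothetical, and the output of diff is saved in a file so that it can be examined afterward.

```shell
# Work in a throwaway directory with two versions of a short file.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'alpha\nbeta\n' >old
printf 'alpha\ngamma\n' >new

# diff succeeds only when the two files are identical.
if diff old new >changes; then
    same=yes
else
    same=no
fi
```

Lines in the output that begin with < come from the first file, and lines that begin with > come from the second.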


The wc(1) program counts the words, lines, and characters in a set of files. The tr(1) program translates
characters into other characters. For example, tr will convert uppercase to lowercase and vice versa. This translates
uppercase into lowercase:


tr '[A-Z]' '[a-z]' <input >output
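The translation can be tried on a one-line file, as in the sketch below. The scratch directory and the contents of input are assumptions. The brackets are quoted so that the shell does not try to match them against file names in the current directory.

```shell
# Work in a throwaway directory with a small input file.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'UNIX System\n' >input

tr '[A-Z]' '[a-z]' <input >output   # uppercase becomes lowercase
lower=$(cat output)
words=$(wc -w <output)              # wc counts the words in the result
```

Swapping the two bracketed arguments would translate lowercase into uppercase instead.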


The sort(1) program sorts files in a variety of ways while cxref(1) makes cross-references. The ptx(1) program
makes a permuted index (keyword-in-context listing). The sed(1) program provides many of the editing
facilities of ed but can apply them to arbitrarily long inputs. The awk(1) program provides the ability to do
both pattern matching and numeric computations and to conveniently process fields within lines. These programs
are for more advanced users, and they are not limited to document preparation. Put them on your list
of things to learn.


Most of these programs are either independently documented (like eqn(1) and tbl(1) in the UNIX System
Document Processing Guide), or are sufficiently simple that
the description in the UNIX System User’s Manual is an adequate explanation.


C. Hints for Preparing Documents 


Most documents go through several versions (always more than expected) before they are finally finished. 
Accordingly, you should do whatever possible to make the job of changing them easy. 


First, when you do the purely mechanical operations of typing, type so that subsequent editing will be easy. 
Start each sentence on a new line. Make lines short, and break lines at natural places, such as after commas and
semicolons, rather than randomly. Since most people change documents by rewriting phrases and adding, delet- 
ing, and rearranging sentences, these precautions simplify any editing needed later. 


Keep the individual files of a document down to modest size, perhaps 10 to 15 thousand characters. Larger 
files edit more slowly. If a dumb mistake is made, it is better to clobber a small file than a big one. Split the 
files at natural boundaries in the document for the same reasons that you start each sentence on a new line.


The second aspect of making changes to documents easy is not to commit to the formatting details too early. 
One of the advantages of formatting packages is permitting format decisions to be delayed until the last possible 
moment. Indeed, until a document is printed, it is not even decided whether it will be typeset or printed out on 
a line printer. 


As a rule of thumb, a document should be produced in terms of a set of requests or commands (macros) for
all but the most trivial jobs. The macros used should then be defined either by using one of the existing macro 
packages (the recommended way) or by defining your own nroff and/or troff macros. As long as the text is 
entered in some systematic way, it can always be cleaned up and formatted by a judicious combination of editing 
commands and macro definitions. 


D. Programming 


No attempt is made here to teach any of the available programming languages, but a few words of
advice are in order. One of the reasons why the UNIX operating system is a productive programming environment
is that there is already a rich set of tools available. Facilities like pipes, input/output redirection, and the
capabilities of the shell often make it possible to do a job by pasting together programs that already exist
instead of writing a program completely from scratch.


E. Shell Programming 


The pipe mechanism lets you fabricate quite complicated operations out of spare parts that already exist. 
For example, the first draft of the spell program was (roughly) 


cat ...      collect the files
| tr ...     put each word on a new line
| tr ...     delete punctuation, etc.
| sort       into dictionary order
| uniq       discard duplicates
| comm       print words in text
             but not in dictionary


More pieces have been added subsequently, but this goes a long way for such a small effort.
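
The draft pipeline can be tried in runnable form. What follows is a sketch under stated assumptions, not the spell program itself. The draft text and the five-word dictionary are invented, the tr arguments are one plausible way to fill in the steps, and a step that folds uppercase to lowercase has been added so that words match the dictionary.

```shell
# Use the same collating order for sort and comm.
LC_ALL=C
export LC_ALL

dir=$(mktemp -d)

# Hypothetical draft text and a tiny sorted dictionary.
printf 'The quick brown fox.\nThe quik fox jumps!\n' >"$dir/draft"
printf 'brown\nfox\njumps\nquick\nthe\n' | sort >"$dir/dict"

misspelled=$(
    cat "$dir/draft" |          # collect the files
    tr -s ' ' '\n' |            # put each word on a new line
    tr -d '[:punct:]' |         # delete punctuation
    tr '[A-Z]' '[a-z]' |        # added step: fold to lowercase
    sort |                      # into dictionary order
    uniq |                      # discard duplicates
    comm -23 - "$dir/dict"      # words in text but not in dictionary
)

echo "$misspelled"

rm -rf "$dir"
```

The -23 option tells comm to suppress the words found only in the dictionary and the words found in both inputs, leaving the words that appear only in the text.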


The editor can be made to do things that would normally require special programs on other systems. For 
example, to list the first and last lines of each of a set of files, such as a book, the user could laboriously type:

ed
e chap1.1
1p
$p
e chap1.2
1p
$p
etc.

The same job can be performed much more easily. One procedure is to type 

ls chap* >temp

to get the list of file names into a file called temp. The temp file is then edited using global commands as follows:

1,$s/^.*$/e &\
1p\
$p/

The results are written into the script file (1,$w script) and then the following command is entered:


ed <script 


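This will produce the same output as the laborious hand typing. Another method is to use a shell loop to repeat
a set of commands for each of a set of arguments, as illustrated below. (For this method, script should contain only
the commands 1p and $p, because each file is already named on the ed command line.)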


for i in chap*
do
    ed $i <script
done


This sets the shell variable i to each file name in turn, then does the command. This command can be entered 
at the terminal or put in a file for later execution. Before the file can be executed, it may be necessary to change 
the mode by entering the following: 
chmod +x filename 
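
The loop can be tried without ed, as in the sketch below. It substitutes sed(1) for ed so that no script file is needed, and the chapter files and their contents are invented for illustration.

```shell
dir=$(mktemp -d)

# Hypothetical chapter files.
printf 'Chapter One\nmiddle\nend of one\n' >"$dir/chap1"
printf 'Chapter Two\nend of two\n' >"$dir/chap2"

# Print the first and last line of each file in turn.
summary=$(
    cd "$dir" &&
    for i in chap*
    do
        sed -n -e '1p' -e '$p' "$i"
    done
)

echo "$summary"

rm -rf "$dir"
```

With this sed command, a file that contains only one line has that line printed twice, because it is both the first line and the last.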

F. Programming with Shell 

An option often overlooked by new users is that the shell is itself a programming language, with variables,
control flow (if-else, while, for, case), subroutines, and interrupt handling. Since there are many building-block
programs, the user can sometimes avoid writing a new program merely by piecing together some of the
building blocks with shell command files.


We will not go into any details here; examples and rules can be found in the “Introduction to Shell” described 
later in this volume. 
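
Purely to give the flavor of these features, here is a minimal sketch. It assumes a POSIX-style shell, since the function and $(( )) arithmetic syntax shown may not be accepted by older shells. The function name classify and the file names are invented for illustration.

```shell
# A subroutine (shell function) that uses case for pattern matching.
classify() {
    case $1 in
    *.c)  echo "C source" ;;
    *.sh) echo "shell script" ;;
    *)    echo "other" ;;
    esac
}

# Variables and a while loop: add up the numbers 1 through 5.
count=0
total=0
while [ "$count" -lt 5 ]
do
    count=$((count + 1))
    total=$((total + count))
done

# if-else control flow.
if [ "$total" -gt 10 ]
then
    size=large
else
    size=small
fi

# Interrupt handling: run a command when the interrupt signal arrives.
trap 'echo "interrupted" >&2; exit 1' INT

kind=$(classify prog.c)
echo "$kind, total $total, $size"
```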


G. Programming in C


The C language is a reasonable choice of a programming language when undertaking anything substantial. 
Everything in the UNIX operating system is based on the C language. The system itself is written in C, as are most
of the programs that run on the system. The C language is also an easy language to use once you get started. 
The C language is introduced and fully described in The C Programming Language by B. W. Kernighan and D. 
M. Ritchie (Prentice-Hall, 1978). Several sections of the manual describe the system interfaces, that is, how to 
do input/output and similar functions. 


Most input and output in C is best handled with the standard input/output library, which provides a set
of I/O functions that exist in compatible form on most machines that have C compilers. In general, it’s wisest
to confine the system interactions in a program to the facilities provided by this library. (Refer to Section 3
of the UNIX System User’s Manual.)


C programs that do not depend too much on the special features of the UNIX operating system (such
as pipes) can be moved to other computers that have C compilers. The list of such machines grows daily; in addi- 
tion to the PDP*-11, it currently includes Honeywell 6000, IBM 370, Interdata 8/32, Data General Nova and 
Eclipse, HP 2100, Harris /7, VAX*-11/780, Western Electric 3B20 and 3B5, and Zilog Z80. Calls to the standard 
I/O library will work on all of these machines. 


There are a number of supporting programs that go with C. The lint(1) program checks C programs for po- 
tential portability problems and detects errors such as mismatched argument types and uninitialized variables. 


For larger programs (anything whose source is on more than one file), the make(1) program allows you 
to specify the dependencies among the source files and the processing steps needed to make a new version. The
program then checks the times that the pieces were last changed and does the minimal amount of recompiling 
to create a consistent updated version. 


The sdb(1) debugger is useful for digging through the dead bodies of C programs but is rather
hard to learn to use effectively. The most effective debugging tool is still careful thought, coupled with judi- 
ciously placed print statements. 


The C compiler provides a limited statistical service, so a user can find where programs spend their time 
executing and what parts of a program are worth optimizing. Compile the programs with the -p option; after
the test run, use the prof(1) command to print a program execution profile. The command time(1) will give the
gross run-time statistics of a program, but the times are not very accurate or reproducible. 


H. Other Languages 


If Fortran must be used, there are two possibilities: Fortran 77 and ratfor. Consider ratfor,
which provides the decent control structures and free-form input that characterize C, yet permits the writing of
code that is also portable to other environments. Bear in mind that UNIX operating system Fortran tends to
produce large and relatively slow-running programs. Furthermore, supporting software like prof(1) is
virtually useless with Fortran programs. If there is a Fortran 77 compiler on your system, it may be a viable
alternative to ratfor and has the nontrivial advantage that it is compatible with the C language and related
programs. (The ratfor processor and C tools can be used with Fortran 77 too.)


If your application requires translating a language into a set of actions or another language, you are in
effect building a compiler, though probably a small one. In that case, the yacc(1) compiler-compiler is recommended,
since it helps in developing a compiler quickly. The lex(1) lexical analyzer generator does the same
job for the simpler languages that can be expressed as regular expressions. It can be used by itself or as a front
end to recognize inputs for a yacc-based program. Both yacc and lex require some sophistication to use, but
the initial effort of learning them can be repaid many times over in programs that are easy to change later.


* Trademark of Digital Equipment Corporation


GLOSSARY 


argument—Words following the command on a command line that provide information necessary to execute
a program.


background—A mode of program execution when the shell does not wait for the command to terminate before 
prompting for another command. 


command—The first word of a command line. It is the name of an executable program.

command line—A request typed in by a user. 

current working directory—The current point of reference for accessing data within the file system. 
kill character—The character which is used to delete the current line. The default kill character is @.
directory—A file system file type that is used to group and organize files and other directories. 


erase character—The character which is used to delete the previous character on the current line. The default 
erase character is #. 


file—A file system file type used to store information. 


foreground—A mode of program execution when the shell waits for the command to terminate before prompt- 
ing for another command. 


full pathname—The pathname of a specific file starting from the root directory. 


group identification number (gid)—A unique number assigned to one or more logins that is used to identify 
groups of related users. 


HOME—Another name for the login directory.

login—A means by which a user can gain access to the UNIX operating system. 

login name—A unique string of letters and numbers used to identify a login. 

mode—In reference to files, protection. 

parent directory—The directory immediately above another directory. 

partial pathname—The pathname between the current working directory and a specific file. 


pathname—A sequence of directory names separated by the / character and ending with the name of a file. 
The pathname defines the connection path between some directory and a file. 


process—A program that is in some state of execution. 
program—Software that can be executed by a user. 


shell—A UNIX operating system program that handles the communication between the system and users. The 
shell accepts commands and causes the appropriate program to be executed. 


user identification number (uid)—A unique number assigned to each login that is used to identify users
and the owner of information stored on the system.